
    Theoretical Properties of the Overlapping Groups Lasso

    We present two sets of theoretical results on the grouped lasso with overlap of Jacob, Obozinski and Vert (2009) in the linear regression setting. This method allows for joint selection of predictors in sparse regression, allowing for complex structured sparsity over the predictors encoded as a set of groups. This flexible framework suggests that arbitrarily complex structures can be encoded with an intricate set of groups. Our results show that this strategy results in unexpected theoretical consequences for the procedure. In particular, we give two sets of results: (1) finite sample bounds on prediction and estimation, and (2) asymptotic distribution and selection. Both sets of results give insight into the consequences of choosing an increasingly complex set of groups for the procedure, as well as what happens when the set of groups cannot recover the true sparsity pattern. Additionally, these results demonstrate the differences and similarities between the grouped lasso procedure with and without overlapping groups. Our analysis shows the set of groups must be chosen with caution: an overly complex set of groups will damage the analysis. Comment: 20 pages, submitted to Annals of Statistics

    Entropy balancing is doubly robust

    Covariate balance is a conventional key diagnostic for methods used for estimating causal effects from observational studies. Recently, interest has emerged in directly incorporating covariate balance into the estimation. We study a recently proposed entropy maximization method called Entropy Balancing (EB), which exactly matches the covariate moments for the different experimental groups in its optimization problem. We show EB is doubly robust with respect to linear outcome regression and logistic propensity score regression, and it reaches the asymptotic semiparametric variance bound when both regressions are correctly specified. This is surprising to us because there is no attempt to model the outcome or the treatment assignment in the original proposal of EB. Our theoretical results and simulations suggest that EB is a very appealing alternative to the conventional weighting estimators that estimate the propensity score by maximum likelihood. Comment: 23 pages, 6 figures, Journal of Causal Inference 201
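The moment-matching idea in the abstract above can be illustrated with a minimal sketch. It assumes a single covariate, uniform base weights, and a plain Newton iteration on the dual problem; the function names `_tilted_weights` and `entropy_balance_1d` are hypothetical and this is not the authors' implementation. With uniform base weights, the maximum-entropy weights take the exponential-tilting form w_i ∝ exp(lam * x_i), so only the scalar lam has to be found.

```python
import math

def _tilted_weights(lam, xs):
    """Normalized weights w_i proportional to exp(lam * x_i) (shifted for numerical stability)."""
    shift = max(lam * x for x in xs)
    raw = [math.exp(lam * x - shift) for x in xs]
    total = sum(raw)
    return [r / total for r in raw]

def entropy_balance_1d(x_control, target_mean, iters=100, tol=1e-10):
    """Reweight control units so their weighted covariate mean equals target_mean.

    Assumption: one covariate and uniform base weights. Plain Newton steps are
    used without a line search, so a target near the edge of the control range
    may need a more careful solver.
    """
    lam = 0.0
    for _ in range(iters):
        w = _tilted_weights(lam, x_control)
        mean = sum(wi * xi for wi, xi in zip(w, x_control))
        gap = mean - target_mean            # gradient of the dual objective
        if abs(gap) < tol:
            break
        var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x_control))
        lam -= gap / var                    # Newton step: Hessian is the weighted variance
    return _tilted_weights(lam, x_control)

# Hypothetical data: control covariate values and a treated-group mean to match.
controls = [0.0, 1.0, 2.0, 3.0, 4.0]
weights = entropy_balance_1d(controls, target_mean=2.5)
balanced_mean = sum(w * x for w, x in zip(weights, controls))
```

After the call, `weights` sums to one and `balanced_mean` matches the target up to the tolerance; no outcome or propensity model is fitted at any point, which is the feature the abstract highlights.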

    Structured, sparse regression with application to HIV drug resistance

    We introduce a new version of forward stepwise regression. Our modification finds solutions to regression problems where the selected predictors appear in a structured pattern, with respect to a predefined distance measure over the candidate predictors. Our method is motivated by the problem of predicting HIV-1 drug resistance from protein sequences. We find that our method improves the interpretability of drug resistance while producing comparable predictive accuracy to standard methods. We also demonstrate our method in a simulation study and present some theoretical results and connections. Comment: Published at http://dx.doi.org/10.1214/10-AOAS428 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    The clustering of galaxies in the SDSS-III Baryon Oscillation Spectroscopic Survey: measuring structure growth using passive galaxies

    We explore the benefits of using a passively evolving population of galaxies to measure the evolution of the rate of structure growth between z=0.25 and z=0.65 by combining data from the SDSS-I/II and SDSS-III surveys. The large-scale linear bias of a population of dynamically passive galaxies, which we select from both surveys, is easily modeled. Knowing the bias evolution breaks degeneracies inherent to other methodologies, and decreases the uncertainty in measurements of the rate of structure growth and the normalization of the galaxy power-spectrum by up to a factor of two. If we translate our measurements into a constraint on sigma_8(z=0) assuming a concordance cosmological model and General Relativity (GR), we find that using a bias model improves our uncertainty by a factor of nearly 1.5. Our results are consistent with a flat Lambda Cold Dark Matter model and with GR. Comment: Accepted for publication in MNRAS (clarifications added, results and conclusions unchanged)

    Detection of Baryon Acoustic Oscillation Features in the Large-Scale 3-Point Correlation Function of SDSS BOSS DR12 CMASS Galaxies

    We present the large-scale 3-point correlation function (3PCF) of the SDSS DR12 CMASS sample of $777,202$ Luminous Red Galaxies, the largest-ever sample used for a 3PCF or bispectrum measurement. We make the first high-significance ($4.5\sigma$) detection of Baryon Acoustic Oscillations (BAO) in the 3PCF. Using these acoustic features in the 3PCF as a standard ruler, we measure the distance to $z=0.57$ to $1.7\%$ precision (statistical plus systematic). We find $D_{\rm V}= 2024\pm29\;{\rm Mpc\;(stat)}\pm20\;{\rm Mpc\;(sys)}$ for our fiducial cosmology (consistent with Planck 2015) and bias model. This measurement extends the use of the BAO technique from the 2-point correlation function (2PCF) and power spectrum to the 3PCF and opens an avenue for deriving additional cosmological distance information from future large-scale structure redshift surveys such as DESI. Our measured distance scale from the 3PCF is fairly independent from that derived from the pre-reconstruction 2PCF and is equivalent to increasing the length of BOSS by roughly $10\%$; reconstruction appears to lower the independence of the distance measurements. Fitting a model including tidal tensor bias yields a moderate-significance ($2.6\sigma$) detection of this bias with a value in agreement with the prediction from local Lagrangian biasing. Comment: 15 pages, 7 figures, submitted to MNRAS
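As a quick arithmetic check on the quoted precision, the sketch below combines the statistical and systematic errors in quadrature and divides by the central value. Treating the two errors as independent is an assumption made here for illustration; the abstract does not say how they were combined. The function name `fractional_precision` is hypothetical.

```python
import math

def fractional_precision(value, *errors):
    """Total fractional error, assuming independent errors added in quadrature."""
    total = math.sqrt(sum(e ** 2 for e in errors))
    return total / value

# D_V = 2024 +/- 29 Mpc (stat) +/- 20 Mpc (sys), as quoted in the abstract.
p = fractional_precision(2024.0, 29.0, 20.0)
print(f"{100 * p:.1f}%")  # prints 1.7%
```

Under that assumption the combined error is about 35 Mpc, which reproduces the 1.7% figure stated for the distance to z=0.57.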

    Baryon Acoustic Oscillations in the Sloan Digital Sky Survey Data Release 7 Galaxy Sample

    The spectroscopic Sloan Digital Sky Survey (SDSS) Data Release 7 (DR7) galaxy sample represents the final set of galaxies observed using the original SDSS target selection criteria. We analyse the clustering of galaxies within this sample, including both the Luminous Red Galaxy (LRG) and Main samples, and also include the 2-degree Field Galaxy Redshift Survey (2dFGRS) data. Baryon Acoustic Oscillations are observed in power spectra measured for different slices in redshift; this allows us to constrain the distance--redshift relation at multiple epochs. We achieve a distance measure at redshift z=0.275 of r_s(z_d)/D_V(0.275)=0.1390+/-0.0037 (2.7% accuracy), where r_s(z_d) is the comoving sound horizon at the baryon drag epoch, D_V(z)=[(1+z)^2D_A^2cz/H(z)]^(1/3), D_A(z) is the angular diameter distance and H(z) is the Hubble parameter. We find an almost independent constraint on the ratio of distances D_V(0.35)/D_V(0.2)=1.736+/-0.065, which is consistent at the 1.1sigma level with the best fit Lambda-CDM model obtained when combining our z=0.275 distance constraint with the WMAP 5-year data. The offset is similar to that found in previous analyses of the SDSS DR5 sample, but the discrepancy is now of lower significance, a change caused by a revised error analysis and a change in the methodology adopted, as well as the addition of more data. Using WMAP5 constraints on Omega_bh^2 and Omega_ch^2, and combining our BAO distance measurements with those from the Union Supernova sample, places a tight constraint on Omega_m=0.286+/-0.018 and H_0 = 68.2+/-2.2 km/s/Mpc that is robust to allowing curvature and non-Lambda dark energy. This result is independent of the behaviour of dark energy at redshifts greater than those probed by the BAO and supernova measurements. (abridged) Comment: 22 pages, 16 figures, minor changes to match version published in MNRAS
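The distance measure D_V(z) defined in the abstract above can be evaluated numerically. The sketch below assumes a flat Lambda-CDM background with no radiation term, takes Omega_m=0.286 and H_0=68.2 km/s/Mpc from the abstract as default inputs, and integrates with a simple trapezoidal rule. The function names are hypothetical and this is not the paper's analysis pipeline.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def hubble(z, h0=68.2, omega_m=0.286):
    """H(z) in km/s/Mpc for flat Lambda-CDM (assumption: no radiation, no curvature)."""
    return h0 * math.sqrt(omega_m * (1 + z) ** 3 + (1 - omega_m))

def comoving_distance(z, steps=2000, **cosmo):
    """Line-of-sight comoving distance in Mpc, via the trapezoidal rule."""
    dz = z / steps
    total = 0.5 * (1 / hubble(0.0, **cosmo) + 1 / hubble(z, **cosmo))
    for i in range(1, steps):
        total += 1 / hubble(i * dz, **cosmo)
    return C_KM_S * total * dz

def d_v(z, **cosmo):
    """D_V(z) = [(1+z)^2 D_A^2 c z / H(z)]^(1/3), in Mpc."""
    # In a flat universe the angular diameter distance is D_C / (1 + z).
    d_a = comoving_distance(z, **cosmo) / (1 + z)
    return ((1 + z) ** 2 * d_a ** 2 * C_KM_S * z / hubble(z, **cosmo)) ** (1 / 3)

# Model prediction for the distance ratio that the abstract constrains.
ratio = d_v(0.35) / d_v(0.2)
```

The value of `ratio` is the model's prediction for D_V(0.35)/D_V(0.2) under these assumed parameters, which can be set beside the measured 1.736+/-0.065; other parameter choices can be passed as keywords, for example `d_v(0.275, h0=70.0, omega_m=0.3)`.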